progress rate
- Asia > China > Shanghai > Shanghai (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (3 more...)
- Information Technology (0.92)
- Leisure & Entertainment > Games (0.67)
AutoTool: Efficient Tool Selection for Large Language Model Agents
Large Language Model (LLM) agents have emerged as powerful tools for automating complex tasks by leveraging the reasoning and decision-making abilities of LLMs. However, a major bottleneck in current agent frameworks lies in the high inference cost of tool selection, especially in approaches like ReAct that repeatedly invoke the LLM to determine which tool to use at each step. In this work, we propose AutoTool, a novel graph-based framework that bypasses repeated LLM inference by exploiting a key empirical observation: tool usage inertia - the tendency of tool invocations to follow predictable sequential patterns. AutoTool constructs a directed graph from historical agent trajectories, where nodes represent tools and edges capture transition probabilities, effectively modeling the inertia in tool selection. It further integrates parameter-level information to refine tool input generation. By traversing this structured representation, AutoTool efficiently selects tools and their parameters with minimal reliance on LLM inference. Extensive experiments across diverse agent tasks demonstrate that AutoTool reduces inference costs by up to 30% while maintaining competitive task completion rates, offering a practical and scalable enhancement for inference-heavy frameworks. Our work highlights the promise of integrating statistical structure into LLM agent design for greater efficiency without sacrificing performance.
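The transition-graph mechanism the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `threshold` confidence cutoff and the fall-back-to-the-LLM convention (returning `None`) are assumptions.

```python
from collections import defaultdict

def build_tool_graph(trajectories):
    """Count tool-to-tool transitions observed in historical trajectories
    and normalize them into edge probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for prev_tool, next_tool in zip(traj, traj[1:]):
            counts[prev_tool][next_tool] += 1
    graph = {}
    for tool, nexts in counts.items():
        total = sum(nexts.values())
        graph[tool] = {t: c / total for t, c in nexts.items()}
    return graph

def predict_next_tool(graph, current_tool, threshold=0.8):
    """Return the most likely next tool if its probability clears the
    threshold; otherwise return None, i.e. defer to LLM inference."""
    candidates = graph.get(current_tool)
    if not candidates:
        return None
    tool, prob = max(candidates.items(), key=lambda kv: kv[1])
    return tool if prob >= threshold else None
```

When tool usage inertia is strong, most steps resolve through the high-probability edge and never reach the model, which is where the claimed inference savings would come from.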
- Workflow (1.00)
- Research Report (1.00)
SkillGen: Learning Domain Skills for In-Context Sequential Decision Making
Ding, Ruomeng, Cheng, Wei, Shao, Minglai, Zhao, Chen
Large language models (LLMs) are increasingly applied to sequential decision-making through in-context learning (ICL), yet their effectiveness is highly sensitive to prompt quality. Effective prompts should meet three principles: focus on decision-critical information, provide step-level granularity, and minimize reliance on expert annotations through label efficiency. However, existing ICL methods often fail to satisfy all three criteria simultaneously. Motivated by these challenges, we introduce SkillGen, a skill-based ICL framework for structured sequential reasoning. It constructs an action-centric, domain-level graph from sampled trajectories, identifies high-utility actions via temporal-difference credit assignment, and retrieves step-wise skills to generate fine-grained, context-aware prompts. We further present a theoretical analysis showing that focusing on high-utility segments supports task identifiability and informs more effective ICL prompt design. Experiments on ALFWorld, BabyAI, and ScienceWorld, using both open-source and proprietary LLMs, show that SkillGen achieves consistent gains, improving progress rate by 5.9%-16.5% on average across models.
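The temporal-difference credit-assignment step can be illustrated with a minimal TD(0) sketch over a single (state, action, reward) trajectory. The state names, hyperparameters, and the one-step backed-up score used to rank actions are all assumptions for illustration, not SkillGen's actual procedure.

```python
def td_values(trajectory, gamma=0.9, alpha=0.5, n_passes=200):
    """TD(0) state-value estimation over one (state, action, reward)
    trajectory; the last state is treated as terminal (bootstrap 0)."""
    v = {}
    for _ in range(n_passes):
        for i, (s, _, r) in enumerate(trajectory):
            s_next = trajectory[i + 1][0] if i + 1 < len(trajectory) else None
            target = r + gamma * v.get(s_next, 0.0)
            v[s] = v.get(s, 0.0) + alpha * (target - v.get(s, 0.0))
    return v

def score_actions(trajectory, v, gamma=0.9):
    """Score each action by its one-step backed-up return, so the
    highest-utility steps can be retrieved as skills for prompting."""
    scores = {}
    for i, (s, a, r) in enumerate(trajectory):
        s_next = trajectory[i + 1][0] if i + 1 < len(trajectory) else None
        scores[a] = r + gamma * v.get(s_next, 0.0)
    return scores
```

Actions closest to the reward score highest, which matches the intuition that prompts should focus on the decision-critical segments of a trajectory.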
- Europe > Austria > Vienna (0.14)
- North America > United States > North Carolina (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments
Lu, Qingyu, Ding, Liang, Cao, Siyi, Liu, Xuebo, Zhang, Kanjian, Zhang, Jinxia, Tao, Dacheng
Agents powered by large language models (LLMs) have demonstrated strong planning and decision-making capabilities in complex embodied environments. However, such agents often suffer from inefficiencies in multi-turn interactions, frequently becoming trapped in repetitive loops or issuing ineffective commands, leading to redundant computational overhead. Instead of relying solely on learning from trajectories, we take a first step toward exploring the early-exit behavior of LLM-based agents. We propose two complementary approaches: (1) an $\textbf{intrinsic}$ method that injects exit instructions during generation, and (2) an $\textbf{extrinsic}$ method that verifies task completion to determine when to halt an agent's trial. To evaluate early-exit mechanisms, we introduce two metrics: one measures the reduction of $\textbf{redundant steps}$ as a positive effect, and the other evaluates $\textbf{progress degradation}$ as a negative effect. Experiments with 4 different LLMs across 5 embodied environments show significant efficiency improvements, with only minor drops in agent performance. We also validate a practical strategy in which a stronger agent assists after an early-exit agent, achieving better performance with the same total steps. We will release our code to support further research.
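A rough sketch of the extrinsic variant, assuming a hypothetical `verify_done` callable as the completion checker and a simple repeated-action heuristic as the loop detector; neither is the paper's exact mechanism.

```python
def run_with_early_exit(agent_step, verify_done, max_steps=30, patience=3):
    """Run an agent loop that halts early when a verifier judges the task
    complete, or when the agent emits the same action `patience` times in
    a row (a crude stand-in for detecting repetitive loops)."""
    history = []
    for _ in range(max_steps):
        action = agent_step(history)
        history.append(action)
        if verify_done(history):
            return history, "completed"
        if len(history) >= patience and len(set(history[-patience:])) == 1:
            return history, "early_exit"
    return history, "budget_exhausted"
```

The saved budget (`max_steps` minus the steps actually taken) is what the paper's redundant-steps metric would credit, while any task left unfinished by the exit shows up as progress degradation.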
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > Singapore (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Workflow (1.00)
- Research Report (0.82)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents
Rawat, Mrinal, Gupta, Ambuje, Goomer, Rushil, Di Bari, Alessandro, Gupta, Neha, Pieraccini, Roberto
The ReAct (Reasoning + Action) capability in large language models (LLMs) has become the foundation of modern agentic systems. Recent LLMs, such as DeepSeek-R1 and OpenAI o1/o3, exemplify this by emphasizing reasoning through the generation of ample intermediate tokens, which help build a strong premise before producing the final output tokens. In this paper, we introduce Pre-Act, a novel approach that enhances the agent's performance by creating a multi-step execution plan along with detailed reasoning for the given user input. This plan incrementally incorporates previous steps and tool outputs, refining itself after each step execution until the final response is obtained. Our approach is applicable to both conversational and non-conversational agents. To measure the performance of task-oriented agents comprehensively, we propose a two-level evaluation framework: (1) turn level and (2) end-to-end. Our turn-level evaluation, averaged across five models, shows that our approach, Pre-Act, outperforms ReAct by 70% in Action Recall on the Almita dataset. While this approach is effective for larger models, smaller models, which are crucial for practical applications where latency and cost are key constraints, often struggle with the complex reasoning tasks required by agentic systems. To address this limitation, we fine-tune relatively small models such as Llama 3.1 (8B & 70B) using the proposed Pre-Act approach. Our experiments show that the fine-tuned 70B model outperforms GPT-4, achieving a 69.5% improvement in action accuracy (turn-level) and a 28% improvement in goal completion rate (end-to-end) on the Almita (out-of-domain) dataset.
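The plan-and-refine loop can be sketched as follows. `llm_plan` and `execute_tool` are hypothetical callables standing in for the model and tool calls, and the `"FINAL"` sentinel is an assumed convention, not part of the paper's specification.

```python
def pre_act_loop(llm_plan, execute_tool, max_steps=10):
    """Draft a multi-step plan, execute its first pending step, then
    re-plan with the step's tool output folded into the context,
    repeating until the plan signals a final response."""
    context = []                  # accumulated (step, observation) pairs
    plan = llm_plan(context)
    while plan and max_steps > 0:
        step = plan[0]
        if step == "FINAL":
            return context
        observation = execute_tool(step)
        context.append((step, observation))
        plan = llm_plan(context)  # refine the plan after each execution
        max_steps -= 1
    return context
```

The key difference from plain ReAct is that each call re-plans the remaining steps rather than choosing only the single next action.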
- North America > United States (0.05)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Workflow (1.00)
- Research Report (1.00)
- Health & Medicine (0.46)
- Information Technology (0.46)
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents
Gioacchini, Luca, Siracusano, Giuseppe, Sanvito, Davide, Gashteovski, Kiril, Friede, David, Bifulco, Roberto, Lawrence, Carolin
The advances made by Large Language Models (LLMs) have led to the pursuit of LLM agents that can solve intricate, multi-step reasoning tasks. As with any research pursuit, benchmarking and evaluation are key cornerstones of efficient and reliable progress. However, existing benchmarks are often narrow and simply compute overall task success. To address these issues, we propose AgentQuest -- a framework where (i) both benchmarks and metrics are modular and easily extensible through well-documented and easy-to-use APIs; and (ii) we offer two new evaluation metrics that can reliably track LLM agent progress while solving a task. We exemplify the utility of the metrics on two use cases wherein we identify common failure points and refine the agent architecture to obtain a significant performance increase. Together with the research community, we hope to extend AgentQuest further, and therefore we make it available at https://github.com/nec-research/agentquest.
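A pluggable-metric design of the kind the abstract describes might look like the following sketch. This is an illustration of the modular idea only, not AgentQuest's actual API; `RepetitionRate` is an invented example metric.

```python
class Metric:
    """Minimal metric interface: benchmarks feed observations to update()
    and read a scalar from compute()."""
    name = "metric"
    def update(self, observation): ...
    def compute(self): ...

class RepetitionRate(Metric):
    """Fraction of actions that repeat the immediately preceding action,
    a simple way to surface the looping failure mode during a run."""
    name = "repetition_rate"
    def __init__(self):
        self.actions = []
    def update(self, action):
        self.actions.append(action)
    def compute(self):
        if len(self.actions) < 2:
            return 0.0
        repeats = sum(a == b for a, b in zip(self.actions, self.actions[1:]))
        return repeats / (len(self.actions) - 1)
```

Because metrics only expose `update`/`compute`, new ones can be dropped into an evaluation loop without touching the benchmark code, which is the extensibility property the framework emphasizes.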
- Europe > North Macedonia > Skopje Statistical Region > Skopje Municipality > Skopje (0.04)
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
Simulation-based Analysis of a Novel Loop-based Road Topology for Autonomous Vehicles
Ramdhan, Stefan, Trandinh, Winnie, Arulmohan, Sathurshan, Hu, Xiayong, Deevy, Spencer, Bandur, Victor, Pantelic, Vera, Lawford, Mark, Wassyng, Alan
The challenges in implementing SAE Level 4/5 autonomous vehicles are manifold, with intersection navigation being a pervasive one. We analyze a novel road topology invented by a co-author of this paper, Xiayong Hu. The topology eliminates the need for traditional traffic control and cross-traffic at intersections, potentially improving the safety of autonomous driving systems. The topology, herein called the Zonal Road Topology, consists of unidirectional loops of road with traffic flowing either clockwise or counter-clockwise. Adjacent loops are directionally aligned with one another, allowing vehicles to transfer from one loop to another through a simple lane change. To evaluate the Zonal Road Topology, a one km² pilot track near Changshu, China is currently being set aside for testing. In parallel, traffic simulations are being performed. To this end, we conduct a simulation-based comparison between the Zonal Road Topology and a traditional road topology for a generic Electric Vehicle (EV) using the Simulation of Urban MObility (SUMO) platform and MATLAB/Simulink. We analyze the topologies in terms of their travel efficiency, safety, energy usage, and capacity. Drive time, number of halts, progress rate, and other metrics are analyzed across varied traffic levels to investigate the advantages and disadvantages of the Zonal Road Topology. Our results indicate that vehicles on the Zonal Road Topology have a lower, more consistent drive time with greater traffic throughput, while using less energy on average. These results become more prominent at higher traffic densities.
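Since adjacent loops are joined by single lane changes, routing in the Zonal Road Topology reduces to path-finding over a loop-adjacency graph. The sketch below illustrates that reduction with invented loop names; it is not part of the authors' simulation setup.

```python
from collections import deque

def loop_route(adjacency, start_loop, goal_loop):
    """BFS over the loop-adjacency graph: a vehicle's route is a chain of
    lane changes between directionally aligned adjacent loops, i.e. a
    shortest path in this graph (fewest transfers)."""
    queue = deque([[start_loop]])
    seen = {start_loop}
    while queue:
        path = queue.popleft()
        if path[-1] == goal_loop:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal loop unreachable from the start loop
```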
- North America > United States (0.93)
- Asia > China (0.24)
- North America > Canada > Ontario > Hamilton (0.14)
- (2 more...)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Ma, Chang, Zhang, Junlei, Zhu, Zhihao, Yang, Cheng, Yang, Yujiu, Jin, Yaohui, Lan, Zhenzhong, Kong, Lingpeng, He, Junxian
Evaluating large language models (LLMs) as general-purpose agents is essential for understanding their capabilities and facilitating their integration into practical applications. However, the evaluation process presents substantial challenges. A primary obstacle is the benchmarking of agent performance across diverse scenarios within a unified framework, especially in maintaining partially-observable environments and ensuring multi-round interactions. Moreover, current evaluation frameworks mostly focus on the final success rate, revealing few insights during the process and failing to provide a deep understanding of the model abilities. To address these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric that captures incremental advancements, as well as a comprehensive evaluation toolkit that features easy assessment of agents for multi-faceted analysis through interactive visualization. This not only sheds light on the capabilities and limitations of LLM agents but also propels the interpretability of their performance to the forefront. Ultimately, AgentBoard serves as a significant step towards demystifying agent behaviors and accelerating the development of stronger LLM agents.
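One simple way to realize a fine-grained progress rate is as the fraction of annotated subgoals satisfied after each turn, instead of a binary final-success flag. This is an illustrative definition, not AgentBoard's exact implementation.

```python
def progress_rate(achieved_subgoals, all_subgoals):
    """Fraction of the task's annotated subgoals the agent has satisfied."""
    if not all_subgoals:
        return 0.0
    hit = sum(1 for g in all_subgoals if g in achieved_subgoals)
    return hit / len(all_subgoals)

def progress_curve(turn_states, all_subgoals):
    """Progress rate after every interaction turn, exposing exactly when
    the agent advances or stalls across a multi-turn episode."""
    return [progress_rate(state, all_subgoals) for state in turn_states]
```

Two agents with identical final success rates can have very different curves, which is the kind of process-level insight the abstract argues binary metrics miss.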
- Asia > China > Shanghai > Shanghai (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (4 more...)
- Leisure & Entertainment (0.67)
- Media (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)
On averaging the best samples in evolutionary computation
Meunier, Laurent, Chevaleyre, Yann, Rapin, Jeremy, Royer, Clément W., Teytaud, Olivier
Choosing the right selection rate is a long-standing issue in evolutionary computation. In the continuous unconstrained case, we prove mathematically that a single parent ($\mu=1$) leads to a sub-optimal simple regret in the case of the sphere function. We provide a theoretically grounded selection rate $\mu/\lambda$ that leads to better progress rates. With our choice of selection rate, we obtain a provable regret of order $O(\lambda^{-1})$, which compares favorably with $O(\lambda^{-2/d})$ in the case where $\mu=1$. We complete our study with experiments that confirm our theoretical claims.
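The scheme under study, averaging the $\mu$ best of $\lambda$ samples, is the recombination step of a $(\mu/\mu,\lambda)$-ES. A minimal sketch on the sphere function follows; the fixed step size and isotropic Gaussian sampling are illustrative simplifications, not the paper's exact algorithm.

```python
import random

def es_step(parent, lam, mu, sigma):
    """One (mu/mu, lambda)-ES iteration on the sphere f(x) = sum(x_i^2):
    sample lambda Gaussian offspring around the parent, keep the mu best,
    and recombine them by coordinate-wise averaging. The ratio mu/lambda
    is the selection rate the paper analyzes."""
    def sphere(x):
        return sum(v * v for v in x)
    offspring = [
        [p + sigma * random.gauss(0.0, 1.0) for p in parent]
        for _ in range(lam)
    ]
    best = sorted(offspring, key=sphere)[:mu]
    # Recombination: average the mu best samples coordinate by coordinate.
    return [sum(col) / mu for col in zip(*best)]
```

With $\mu = 1$ the step is pure truncation selection; the paper's point is that a larger $\mu/\lambda$, via this averaging, yields a provably better regret on the sphere.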
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Poland > Lesser Poland Province > Kraków (0.04)